3. Returning Just the Top Level
The last variation on GROUP BY is the GROUPING SETS operator, introduced in SQL Server 2008. This operator returns just
the top-level rollup rows for each grouping level and omits
the group-level summary rows that earlier
versions of the query returned, as follows:
SELECT Store, Item, Color, SUM(Quantity) AS TotalQty
FROM Inventory
GROUP BY GROUPING SETS (Store, Item, Color)
ORDER BY Store, Item, Color
GO
Store Item Color TotalQty
----- -------------------- ---------- ------------------------------
NULL NULL Blue 347
NULL NULL Green 535
NULL NULL Red 390
NULL Chair NULL 290
NULL Sofa NULL 2
NULL Table NULL 980
NJ NULL NULL 190
NY NULL NULL 504
PA NULL NULL 578
(9 row(s) affected)
GROUPING SETS is another variation on GROUP BY
that you can use when you require only top-level rollups for each of
your grouping levels (that is, one set of group rollups per level). In
this case, you get a total quantity report for each color, each item,
and each store location, without the summary rows for each combination
of grouping levels that earlier versions of the query returned.
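One way to see what GROUPING SETS (Store, Item, Color) computes is to write each grouping set as its own GROUP BY query and combine the results with UNION ALL. The following is only a sketch of that equivalence, assuming the same Inventory table used above; it should return the same nine rows as the GROUPING SETS query, which is far more concise.

```sql
-- Sketch: each single-column grouping set written as a separate query.
-- The NULL placeholders stand in for the columns that are not grouped.
SELECT Store, NULL AS Item, NULL AS Color, SUM(Quantity) AS TotalQty
 FROM Inventory
 GROUP BY Store
UNION ALL
SELECT NULL, Item, NULL, SUM(Quantity)
 FROM Inventory
 GROUP BY Item
UNION ALL
SELECT NULL, NULL, Color, SUM(Quantity)
 FROM Inventory
 GROUP BY Color
ORDER BY Store, Item, Color
```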
But the GROUPING SETS story doesn’t end here, of course. Unlike WITH ROLLUP and WITH CUBE—which are mutually exclusive in the same query—rollup and cube operations can be used together and with GROUPING SETS
in any combination. This means that you can compose one query that
returns only top-level rollups for certain grouping levels and also
returns the lower-level rollups and summary rows for other grouping
levels, just like you get using WITH ROLLUP or WITH CUBE in separate queries.
To achieve this, SQL Server provides an alternative syntax for WITH ROLLUP and WITH CUBE that allows these operators to be combined with one another in the same GROUP BY clause. The syntax is simple: drop the WITH keyword, place the ROLLUP or CUBE keyword before the grouping columns rather than after, and enclose the grouping columns in parentheses.
For example, the following two GROUP BY clauses are interchangeable:
GROUP BY Item, Color WITH ROLLUP
GROUP BY ROLLUP(Item, Color)
Similarly, these two clauses are also equivalent:
GROUP BY Item, Color WITH CUBE
GROUP BY CUBE(Item, Color)
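It can help to think of ROLLUP and CUBE as shorthand for an explicit list of grouping sets. As an illustrative sketch against the same Inventory table, the following two queries spell out the sets that ROLLUP(Item, Color) and CUBE(Item, Color) generate. The empty parentheses () denote the grand total across all rows.

```sql
-- Same grouping sets as GROUP BY ROLLUP(Item, Color):
-- detail rows, per-item subtotals, and the grand total.
SELECT Item, Color, SUM(Quantity) AS TotalQty
 FROM Inventory
 GROUP BY GROUPING SETS ((Item, Color), (Item), ())

-- Same grouping sets as GROUP BY CUBE(Item, Color):
-- the ROLLUP sets plus per-color subtotals.
SELECT Item, Color, SUM(Quantity) AS TotalQty
 FROM Inventory
 GROUP BY GROUPING SETS ((Item, Color), (Item), (Color), ())
```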
Although the older WITH syntax and the newer syntax are interchangeable when used on their own, you must use the newer syntax if you want to combine ROLLUP and CUBE with one another or with GROUPING SETS in a single query. Here is another version of the inventory query that does just that:
SELECT Store, Item, Color, SUM(Quantity) AS TotalQty
FROM Inventory
GROUP BY GROUPING SETS(Store), CUBE(Item, Color)
ORDER BY Store, Item, Color
The GROUP BY clause in this query includes both a GROUPING SETS operator on Store and a CUBE operator on Item and Color. This tells SQL Server to return top-level rollups only on the Store column and full summaries with multidimensional rollups on the Item and Color columns. Here are the results:
Store Item Color TotalQty
----- -------------------- ---------- ------------------------------
NJ NULL NULL 190
NJ NULL Blue 122
NJ NULL Green 2
NJ NULL Red 66
NJ Chair NULL 32
NJ Chair Blue 22
NJ Chair Red 10
NJ Sofa NULL 2
NJ Sofa Green 2
NJ Table NULL 156
NJ Table Blue 100
NJ Table Red 56
NY NULL NULL 504
NY NULL Blue 225
NY NULL Green 229
NY NULL Red 50
NY Chair NULL 122
NY Chair Blue 101
NY Chair Red 21
NY Table NULL 382
NY Table Blue 124
NY Table Green 229
NY Table Red 29
PA NULL NULL 578
PA NULL Green 304
PA NULL Red 274
PA Chair NULL 136
PA Chair Red 136
PA Table NULL 442
PA Table Green 304
PA Table Red 138
(31 row(s) affected)
The rows with NULL values for both Item and Color are the top-level rollups for Store returned by the GROUPING SETS(Store)
operator. These rows report just the totals for each store (all items,
all colors). All of the other rows are the multidimensional rollup and
summary results returned by CUBE(Item, Color). These rows report aggregations for every combination of Item and Color. Because Store is specified in GROUPING SETS and not in CUBE, every row is grouped by store, so you don’t see rollups across all stores.
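When several grouping operators are separated by commas, their grouping sets are combined with one another, so every set produced by CUBE(Item, Color) is paired with Store. As a sketch under that reading, the combined clause can be written as a single explicit GROUPING SETS list, which should return the same 31 rows:

```sql
-- Sketch: GROUPING SETS(Store), CUBE(Item, Color) expanded by hand.
-- Store appears in every set, so no row aggregates across stores.
SELECT Store, Item, Color, SUM(Quantity) AS TotalQty
 FROM Inventory
 GROUP BY GROUPING SETS (
   (Store, Item, Color),  -- detail rows per store
   (Store, Item),         -- all colors of an item per store
   (Store, Color),        -- all items of a color per store
   (Store)                -- store totals
 )
 ORDER BY Store, Item, Color
```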
You can use GROUPING SETS, ROLLUP, and CUBE in any combination you want with the GROUP BY
clause. As a result, you gain tremendous flexibility for grouping,
aggregating, and analyzing your data just the way you need to. The only
restriction in usage is the same one that applies when using GROUP BY on its own: columns returned by the query must be specified either in the GROUP BY clause (in any of the GROUPING SETS, ROLLUP, or CUBE operators) or in an aggregate function that operates across all the combined rows for the group (such as SUM, COUNT, MIN, MAX, and so on).
We’ll conclude the discussion of GROUPING SETS by discussing NULL values. As you’ve seen, SQL Server returns NULL
values to represent all values in high-level rollup rows. If you’re
fortunate enough to be working with data that is guaranteed not to
contain NULL values, life is good. More often, though, that is not the case, and it becomes a problem to distinguish between “real” NULL values and the NULL values representing “all values” in rollup rows.
To demonstrate, add two more rows to the Inventory table for lamps that have no color association. These rows store NULL values in the Color column, as shown in Example 3.
Example 3. Introducing NULL values into the Inventory table.
INSERT INTO Inventory VALUES('NY', 'Lamp', NULL, 36)
INSERT INTO Inventory VALUES('NJ', 'Lamp', NULL, 8)
Now run the exact same query you ran before:
SELECT Store, Item, Color, SUM(Quantity) AS TotalQty
FROM Inventory
GROUP BY GROUPING SETS(Store), CUBE(Item, Color)
ORDER BY Store, Item, Color
GO
Store Item Color TotalQty
----- -------------------- ---------- ------------------------------
NJ NULL NULL 8
NJ NULL NULL 198
NJ NULL Blue 122
NJ NULL Green 2
:
NJ Table Blue 100
NJ Table Red 56
NY NULL NULL 36
NY NULL NULL 540
NY NULL Blue 225
NY NULL Green 229
:
PA Table Green 304
PA Table Red 138
(37 row(s) affected)
These are very confusing results. Because both the “all colors” rollup rows and the lamp rows with “no color” have a NULL value for Color,
it is impossible to distinguish between the two when analyzing the
query results. For example, the first row returns the rollup for all
items with no color in NJ (that’s the 8 lamps), and the second row returns the rollup for all items in all colors in NJ, but there is no way to discern that difference because NULL
is used to represent both “no color” and “all colors.” The same problem
occurs again further down in the results for NY, where there are also
colorless lamps in stock.
The solution to this problem is to use the GROUPING function in your query. The GROUPING function returns 1 (true) if the column passed to it represents an “all values” rollup in the current row, and it returns 0 (false) otherwise. It is therefore possible to distinguish between “all values” rollup columns (which are always NULL) and regular data (which might be NULL, as is the case for the lamps, which have no color values). Here is a revised version of the query that uses the GROUPING function in conjunction with CASE to produce a better result set that clears up the confusion between “all values” and “no value”:
SELECT
  CASE WHEN GROUPING(Store) = 1
    THEN '(all)' ELSE Store END AS Store,
  CASE WHEN GROUPING(Item) = 1
    THEN '(all)' ELSE Item END AS Item,
  CASE WHEN GROUPING(Color) = 1
    THEN '(all)' ELSE Color END AS Color,
  SUM(Quantity) AS TotalQty
FROM Inventory
GROUP BY GROUPING SETS(Store), CUBE(Item, Color)
ORDER BY Store, Item, Color
The CASE construct tests each grouping column returned by the query using the GROUPING function. If it returns 1 (true), the column represents an “all values” rollup. In this case, the string (all) is returned, rather than the NULL value that would have otherwise been returned. If it returns 0 (false), the column contains regular data, which might or might not be NULL. Although there are NULL values only in the Color column (for the lamps), apply the same CASE and GROUPING to the Store and Item columns as well. This is a defensive coding measure against the possibility of the Store or Item column also containing NULL values in the future. Taking this approach resolves the confusion with respect to NULL values and rollups in the query results, as shown here:
Store Item Color TotalQty
----- -------------------- ---------- ------------------------------
NJ (all) NULL 8
NJ (all) (all) 198
NJ (all) Blue 122
NJ (all) Green 2
:
NJ Table Blue 100
NJ Table Red 56
NY (all) NULL 36
NY (all) (all) 540
NY (all) Blue 225
NY (all) Green 229
:
PA Table Green 304
PA Table Red 138
(37 row(s) affected)
It’s now clear that the
first row returns the rollup for all items with no color in NJ, whereas
the second row returns the rollup for all items in all colors (including
no color) in NJ. The same is true farther down in the NY results, where
there are also colorless lamps in stock. Therefore, to avoid any
potential confusion concerning NULL values in your grouping queries, you should always use the GROUPING function in this manner to translate the NULL values that mean “all values” for the user.
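To see the flags that drive those CASE expressions, you can also return the GROUPING results as columns of their own. This is a sketch against the same Inventory table; the column aliases IsAllItems and IsAllColors are illustrative names, not part of the original query.

```sql
-- Sketch: expose the GROUPING flags next to the raw column values.
-- 1 means the NULL in that column stands for "all values";
-- 0 means the value (NULL or not) is regular data.
SELECT Store, Item, Color,
  GROUPING(Item)  AS IsAllItems,
  GROUPING(Color) AS IsAllColors,
  SUM(Quantity)   AS TotalQty
FROM Inventory
GROUP BY GROUPING SETS(Store), CUBE(Item, Color)
ORDER BY Store, Item, Color
```

For the two NJ rows that both show NULL for Item and Color, the row with a quantity of 8 should report IsAllColors as 0 (the colorless lamps), and the row with a quantity of 198 should report it as 1 (all colors).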
As long as you’re modifying the query to produce more readable results,
enhance it one more time. In the same way that you translated the NULL for “all values” to the text (all), you can translate the NULL values for regular “missing” data to (n/a). This is easy to do by changing the ELSE clause of each CASE construct to apply the ISNULL function to the column, as shown here:
SELECT
  CASE WHEN GROUPING(Store) = 1 THEN '(all)'
    ELSE ISNULL(Store, '(n/a)')
  END AS Store,
  CASE WHEN GROUPING(Item) = 1 THEN '(all)'
    ELSE ISNULL(Item, '(n/a)')
  END AS Item,
  CASE WHEN GROUPING(Color) = 1 THEN '(all)'
    ELSE ISNULL(Color, '(n/a)')
  END AS Color,
  SUM(Quantity) AS TotalQty
FROM Inventory
GROUP BY GROUPING SETS(Store), CUBE(Item, Color)
ORDER BY Store, Item, Color
The ELSE clause in each CASE construct runs if the GROUPING function returns 0
(false). This means that the column is not an “all values” rollup, but
regular column data. You want regular column data to be returned as is,
except for NULL values that should be returned as the string (n/a). The ISNULL function tests for NULL values and performs the translation on them, as shown in the results returned by the query:
Store Item Color TotalQty
----- -------------------- ---------- ------------------------------
NJ (all) (all) 198
NJ (all) (n/a) 8
NJ (all) Blue 122
NJ (all) Green 2
:
NJ Chair Red 10
NJ Lamp (all) 8
NJ Lamp (n/a) 8
NJ Sofa (all) 2
NJ Sofa Green 2
NJ Table (all) 156
NJ Table Blue 100
NJ Table Red 56
NY (all) (all) 540
NY (all) (n/a) 36
NY (all) Blue 225
:
NY Chair Red 21
NY Lamp (all) 36
NY Lamp (n/a) 36
NY Table (all) 382
NY Table Blue 124
:
PA Table Green 304
PA Table Red 138
(37 row(s) affected)
The query now returns no NULL values at all, which is much better for your users who don’t really know or care exactly what NULL means anyway. By translating these values appropriately to (all) and (n/a), you have produced a far more usable report for them.